Guidelines (read carefully before starting)ΒΆ
Objective: This practical session consists in a series of 6 short exercises introducting the time-frequency analysis of sounds (acoustic signals).
Guidelines: after retrieving the resources for the lab on moodle:
- place the .zip archive in a local folder (Computer -> Documents/Python/);
- unzip the archive .zip;
- rename the folder with the convention lab1_Name1_Name2;
- duplicate the notebook file and rename it lab1_Name1_Name2.ipynb;
- at the end of the session, do not forget to transfer your work to your own network space if you are working on a machine from the school (do not leave your work on the C: drive).
Assessment β global grade from F to A (A+)
Assessmment based on your answer to the exercises reported in the notebook and any additional .py file produced. Custom code should be commented whenever appropriate. Figures should be clearly annotated (axes, title).
- Numerical correctness
- Implementation clarity (documentation, relevance of the comments)
- Answers to the questions and overall presentation of the Jupyter notebook.
Remark: some of the Python modules used in this lab session are:
- the
numpy.fftpackage, providing implementation of the standard Fourier transforms and related tools; - the
scipy.signalpackage, containing several functions to perform various operations on signals
ConfigurationΒΆ
# make sure the notebook reloads the module each time we modify it
%load_ext autoreload
%autoreload 2
# Uncomment the next line if you want to be able to zoom on plots (one of the options below)
# %matplotlib widget
# %matplotlib notebook
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy.io as scio
import scipy.io.wavfile as scwav
import scipy.signal as sg
from IPython.display import Audio
from ipywidgets import fixed, interact, interact_manual, interactive
from matplotlib.colors import LogNorm
from matplotlib.gridspec import GridSpec
SMALL_SIZE = 16
MEDIUM_SIZE = 20
BIGGER_SIZE = 24
plt.rc("font", size=SMALL_SIZE) # controls default text sizes
plt.rc("axes", titlesize=SMALL_SIZE) # fontsize of the axes title
plt.rc("axes", labelsize=MEDIUM_SIZE) # fontsize of the x and y labels
plt.rc("xtick", labelsize=SMALL_SIZE) # fontsize of the tick labels
plt.rc("ytick", labelsize=SMALL_SIZE) # fontsize of the tick labels
plt.rc("legend", fontsize=SMALL_SIZE) # legend fontsize
plt.rc("figure", titlesize=BIGGER_SIZE) # fontsize of the figure title
Reading, displaying, listening to a signalΒΆ
Signals can be represented as 1D vectors, stored as a row or a column vector. Several sound files signals are available in the sounds folder.
The following instruction reads a signal from a .wav file (song of a bird), with fs the sampling frequency loaded from the file.
fs, x = scwav.read("sounds/bird.wav")
Audio(x, rate=fs)
# time samples corresponding to x
t = np.arange(0, x.size) / fs
plt.figure(figsize=(10, 6))
plt.plot(t, x)
plt.grid()
plt.xlabel("Time [s]")
plt.ylabel("Signal intensity")
plt.title("Bird signal")
plt.show()
- One can visualize a small segment of the signal in the time domain and observe its oscillatory behaviour.
# first instant in the considered time interval
t0 = 0.18
# time interval duration
T = 0.02
# time instants within the time interval considered
t = np.linspace(t0, t0 + T, int(fs * T))
plt.figure(figsize=(12, 6))
plt.plot(t, x[int(fs * t0) : int(fs * (t0 + T))])
plt.xlabel(r"$t$ [s]", fontsize=18)
plt.ylabel(r"$x(t)$", fontsize=18)
plt.show()
- For the rest of the lab, we consider a limited portion of the signal, composed of
Nsamples.
N = 2**15 # 32768 samples
x = x[:N]
plt.figure()
plt.plot(x)
#plt.plot(t[:N], x)
plt.grid()
plt.xlabel("Time [s]")
plt.ylabel("Amplitude")
plt.title("Bird signal")
plt.show()
Fourier transform of the signalΒΆ
One can first compute the discrete Fourier transform of the signal and display its modulus to have an idea about the distribution of its energy in the frequency space (remember that $S_x(\nu) = |X(\nu)|^2$ for continous-time signal of finite energy, with $S_x$ the energy spectral density of $x$).
The frequencies at which the discrete Fourier transform is computed go from $0$ to $(N-1)\frac{f_s}{N}$ (with $n$ the number of signal samples), with a regular frequency step $\frac{f_s}{N}$.
Fx = np.fft.fft(x)
# list of frequencies associated with the DFT
freq = np.arange(0, N) * (fs / N)
plt.figure()
plt.plot(freq, np.abs(Fx))
plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|Fx|$")
plt.grid()
plt.show()
For a better readability and to highlight the Hermitian symmetry of the Fourier transform of real signals, the np.fft.fftshift function centers frequencies around $0$, between $ -\frac{f_s}{2} $ and $ (N-1)\frac{f_s}{2N}$.
One can use the np.fft.fftfreq function to automatically access the associated list of frequencies.
Remark: note that the
np.fft.fftfreqfunction produces frequencies from 0 to $f_s/2$ followed by frequencies from $-f_s/2$ to $0$.
Fx_shift = np.fft.fftshift(Fx)
freq = np.fft.fftshift(np.fft.fftfreq(N, d=1.0 / fs))
plt.figure()
plt.plot(freq, abs(Fx_shift))
plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|Fx|$")
plt.grid()
plt.show()
Answers to Exercise 1ΒΆ
fscorresponds to the frequency at which the signal has been sampled (44100 samples per second). This frequency is often used to sample analog audio. This standard frequency was chosen because it respects the Shannon criteria for human hearing, as 44.1kHz is larger than 2 times the maximum frequency of the audible human range (20~20kHz).
print(fs)
44100
- Most of the energy of the signal is concentrated in 2 frequency band peaks : the band between around 4kHz and 7kHz and the one between around 7.5kHz and 10kHz Hertz. Moreover, some of the energy of the signal is concentrated at 0 Hz frequency peak, which represents the non-zero average value of the signal.
The Fourier transform only gives access to a global information on the frequency content of the signal. A "local" Fourier analysis would be useful to have a more precise description of the sound, such as local amplitude and local frequency variations (local in the temporal domain).
- In this section, we will compute the Fourier transform of various segments of the signal, centered around time instants
tk, and represent its modulus to illustrate the variations of the local frequency content of the signal across time. Detailed steps are reported at the end of this section. - We will consider windowed segments of the signal
w*x, withwa Gaussian window centered intkcontaining $2M+1$ samples, with $M=400$ and standard deviation $\sigma = 200$.
Indication : to build a Gaussian window, use the function
scipy.signal.get_window. Useful code examples are given below.
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
n0 = 10000
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
# generating a Gaussian time-window with variance sigma
sigma = 200
gauss_win = sg.get_window(("gaussian", sigma), L)
Displaying the window as a function of timeΒΆ
plt.figure()
plt.plot(t[sel], gauss_win)
plt.xlabel("Time [s]")
plt.ylabel("Window amplitude")
plt.grid()
plt.show()
Extraction of a windowed signal segment for local Fourier analysisΒΆ
Segment extraction and multiplication with the observation window centered around the sample $n_0$ (corresponding time insntant $t_0 = (n_0-1)/f_s$).
x_segment = gauss_win * x[sel]
plt.figure()
plt.plot(t[sel], x_segment)
plt.xlabel("Time [s]")
plt.ylabel("Amplitude of $wx$")
plt.title("Signal segment around $t_0$={0:1.3f} s".format(t[n0]))
plt.grid()
plt.show()
Amplitude of the DFT coefficients of the windowed time-segmentΒΆ
Fx_segment = np.fft.fftshift(np.fft.fft(x_segment))
freq = np.fft.fftshift(np.fft.fftfreq(Fx_segment.size, d=1.0 / fs))
plt.figure()
plt.plot(freq, np.abs(Fx_segment))
plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|F(wx)|$")
plt.title("Spectrum of the time segment around $t_0$={0:1.3f} s".format(t[n0]))
plt.grid()
plt.show()
Adapting the code above, compute the Fourier transform of a segment of the signal around different values of
t0(corresponding to the time samplen0 = t0*fs), forn0= 1000 to 10000 in steps of 1000. Use a Gaussian window of width $ 2M + 1 $ samples, with $ M = 400 $ and $ \sigma = 200 $.Observe the results obtained and comment.
What does the segmented analysis highlight?
Display the Gaussian window for different values of $\sigma$ ($=50, 100$ et $200$ for instance) as well as its product with the local segment of signal
x[sel]Β around the sample $n_0 = 10000$.For each value of $\sigma$ (width of the Gaussian window), compute the Fourier transform of the window. Observe & comment.
Compute the Fourier transform of the segment of signal around the sample $n_0 = 10000$ obtained in question 4. (i.e., for the different values of $\sigma$). Observe and comment.
Why do we multiply the segment of signal with a Gaussian (or other) window before computing its Fourier transform ?
Indication: Compare the modulus of the DFT obtained with and without the Gaussian window. Focus on a few frequencies for the comparison.
For $\sigma=200$, write a
forloop to repeat the previous analysis on one or two segments at various instants $t_k = n_k / f_s$, for $n_k=10000$ to $n_k=20000$ with a step of $5000$.Using the code below, represent the sum of all these spectra altogether on one graph. What do you notice in comparison with the full spectrum of the signal $x$ displayed earlier?
Answers to Exercise 2ΒΆ
# Exercise 2.1
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
# creates an array from with values 1k, 2k, 3k, ... , 10k
n0_array = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000]
# iterates over the values in n0
for n0 in n0_array:
#print(n)
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
# generating a Gaussian time-window with variance sigma
sigma = 200
gauss_win = sg.get_window(("gaussian", sigma), L)
x_segment = gauss_win * x[sel]
Fx_segment = np.fft.fftshift(np.fft.fft(x_segment))
freq = np.fft.fftshift(np.fft.fftfreq(Fx_segment.size, d=1.0 / fs))
plt.figure( figsize=(9,3), dpi= 100, layout="constrained")
plt.plot(freq, np.abs(Fx_segment))
#plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|F(wx)|$")
plt.title("Spectrum of the time segment around $t_0$={0:1.3f} s".format(t[n0]))
plt.grid()
plt.show()
From n0=1000 $(t_0 = 0.023 s)$ to n0=5000 $(t_0 = 0.113 s)$ the original signal is quiet, there is essentially only noise, so that is what we are sampling. This reflects on the fourier transform in such a way that the only peak is the offset (average value) at 0 Hz. Once we go beyond above n0=5000 $(t_0 = 0.113 s)$ the bird's songs starts appearing in the original signal and we start having harmonic patterns similar to the one we obtained when we calculated the fourier transform of the first 32768 samples of the original signal (from $ t_0 = 0 s $ to $ t_0 = 0.743 s $).
The segmented analysis the spectral components of the signal at specific instances in time. In the specific example discussed above, we see that the spectral components of the signal are very different before and after n0=5000 $(t_0 = 0.113 s)$. This is corroborated when looking at the original signal over time.
# Exercise 2.4
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
n0 = 10000
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
# generating a Gaussian time-window with variance sigma
sigma_array = [50, 100, 200]
for sigma in sigma_array:
gauss_win = sg.get_window(("gaussian", sigma), L)
plt.figure( figsize=(8,3), dpi= 100, layout="tight")
#plt.figure()
plt.plot(t[sel], gauss_win)
plt.xlabel("Time [s]")
#v $t_0$={0:1.3f} s".format(t[n0])
plt.ylabel("Window ampl.")
plt.title("$\sigma$ = {0:d} ".format(sigma) )
plt.grid()
plt.show()
x_segment = gauss_win * x[sel]
plt.figure( figsize=(8,3), dpi= 100, layout="tight")
plt.plot(t[sel], x_segment)
plt.xlabel("Time [s]")
plt.ylabel("Ampl. of $wx$")
plt.title("Signal segment around $t_0$={0:1.3f} s with $\sigma$ = {0:d} ".format(sigma) )
plt.grid()
plt.show()
# Exercise 2.5
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
n0 = 10000
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
# generating a Gaussian time-window with variance sigma
sigma_array = [50, 100, 200]
for sigma in sigma_array:
gauss_win = sg.get_window(("gaussian", sigma), L)
x_segment = gauss_win
Fx_segment = np.fft.fftshift(np.fft.fft(x_segment))
freq = np.fft.fftshift(np.fft.fftfreq(Fx_segment.size, d=1.0 / fs))
plt.figure( figsize=(9,3), dpi= 100, layout="constrained")
plt.plot(freq, np.abs(Fx_segment))
#plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|F(wx)|$")
plt.title("Spectrum of the windows with $\sigma$ = {0:1d}".format(sigma) )
plt.grid()
plt.show()
- Based on the Gabor Heisenberg principle, if the signal is concentrated in time it has more uncertainty over its spectral content, conversely, when it is spread over time, we are more certain over its spectral content. This way, we can clearly see that the window with $\sigma = 50$ is much more concentrated in time than the windows with $\sigma = 100$ and $\sigma = 200$, which makes its fourier transform be more spread over frequency.
# Exercise 2.6
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
n0 = 10000
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
# generating a Gaussian time-window with variance sigma
sigma_array = [50, 100, 200]
for sigma in sigma_array:
gauss_win = sg.get_window(("gaussian", sigma), L)
x_segment = gauss_win * x[sel]
Fx_segment = np.fft.fftshift(np.fft.fft(x_segment))
freq = np.fft.fftshift(np.fft.fftfreq(Fx_segment.size, d=1.0 / fs))
plt.figure( figsize=(9,3), dpi= 100, layout="constrained")
plt.plot(freq, np.abs(Fx_segment))
#plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|F(wx)|$")
plt.title("Spectrum of the time segment around $t_0$=0.227 s and $\sigma$ = {0:1d}".format(sigma) )
plt.grid()
plt.show()
Same as question 5, there is a compromise between being precise in time and in frequency according to Gabor Heisenberg principle. $\sigma = 50$ is the more localized in time thus it yields less details in the frequency domain. $\sigma = 200$ is the less localized in time this it yields more details in the frequency domain. $\sigma = 100$ is the middle ground.
We multiply the segment of the signal with the gaussian window before computing the fourier transform in order to obtain spectral information that is more localized in time. If we computed the fourier transform of the whole signal it would be equivalent to a retangular window encompassing the whole signal, which is not well localized in time.
# Exercise 2.8
t = np.arange(0, x.size) / fs # list of times considered
# reference time sample
# corresponds to the time instant t0 = (n0-1)/fs
n0_array = [10000, 15000, 20000]
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# generating a Gaussian time-window with variance sigma
sigma = 200
for n0 in n0_array:
# sample indices of the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
gauss_win = sg.get_window(("gaussian", sigma), L)
x_segment = gauss_win * x[sel]
Fx_segment = np.fft.fftshift(np.fft.fft(x_segment))
if(n0 == n0_array[0]):
total_spectrum = Fx_segment
else :
total_spectrum += Fx_segment
freq = np.fft.fftshift(np.fft.fftfreq(Fx_segment.size, d=1.0 / fs))
plt.figure( figsize=(9,3), dpi= 100, layout="constrained")
plt.plot(freq, np.abs(Fx_segment))
#plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|F(wx)|$")
plt.title("Spectrum of the time segment around $t_0$ = {0:1.3f}".format(t[n0]) )
plt.grid()
plt.show()
# Indication to sum several spectra from successive time segments
# half width of the analysing window
M = 400
# number of samples considered per segment
L = 2 * M + 1
# generating a Gaussian time-window with variance sigma
sigma = 200
gauss_win = sg.get_window(("gaussian", sigma), L)
freq = np.fft.fftshift(np.fft.fftfreq(L, d=1.0 / fs))
# indices of the samples composing the selected segment
n0 = M
sel = np.arange(n0 - M, n0 + M + 1, 1)
x_segment = gauss_win * x[sel]
Xf_sum = np.fft.fftshift(np.fft.fft(x_segment))
# loop over successive segments, spaced by L samples
for n0 in range(M, len(x) - L, M):
# indices of the samples composing the selected segment
sel = np.arange(n0 - M, n0 + M + 1, 1)
x_segment = gauss_win * x[sel]
Xf_sum += np.fft.fftshift(np.fft.fft(x_segment))
abs_Xf_sum = np.abs(Xf_sum)
plt.figure()
plt.plot(freq, abs_Xf_sum)
plt.xlabel("Frequency [Hz]", fontsize=18)
plt.ylabel(r"$| \sum_{q} F (w_q x) |$", fontsize=18)
plt.show()
- The spectrum of the sum of many localized fourier transforms is very similar to the spectrum of the fourier transform of the original segment of signal. This clearly makes sense as the sum of many localized sequential fourier transforms should be equal to the fourier transform of the original signal, which is not localized in time.
Your answers...
fs = 44100
N = x.size
xmin = np.min(x)
xmax = np.max(x)
M = 200
L = 2 * M + 1
sigma = 100
gauss_win = sg.get_window(("gaussian", sigma), L)
freq = np.fft.fftshift(np.fft.fftfreq(L, d=1.0 / fs))
# interactive plot
# DO NOT MODIFY THIS CELL
def local_dft_magnitude(
x: np.ndarray,
xmin: float,
xmax: float,
t: np.ndarray,
freq: np.ndarray,
window,
center: int,
):
L = window.size
half_width = L // 2
select = np.s_[center - half_width : center + half_width + 1]
x_segment = window * x[select]
abs_Fx = np.fft.fftshift(np.abs(np.fft.fft(x_segment)))
fig = plt.figure(figsize=(18, 6), layout="constrained")
gs = GridSpec(3, 2, figure=fig)
ax1 = fig.add_subplot(gs[0, :])
ax2 = fig.add_subplot(gs[1, 0])
ax3 = fig.add_subplot(gs[1, 1])
ax4 = fig.add_subplot(gs[2, :])
ax1.plot(t, x)
ax1.ticklabel_format(style="scientific", axis="y", scilimits=(1, 2))
ax1.vlines(
[t[center - half_width], t[center + half_width]],
xmin,
xmax,
transform=ax1.get_xaxis_transform(),
colors="r",
)
ax1.set_xlabel("Time [s]")
ax1.set_ylabel(r"$x(t)$")
ax1.grid(True)
ax2.plot(t[select], x[select])
ax2.set_ylim(xmin, xmax)
ax2.ticklabel_format(style="scientific", axis="y", scilimits=(1, 2))
ax2.set_xlabel("Time [s]")
ax2.set_ylabel(r"Segment $x(t)$")
ax2.grid(True)
ax3.plot(t[select], x_segment)
ax3.set_ylim(xmin, xmax)
ax3.ticklabel_format(style="scientific", axis="y", scilimits=(1, 2))
ax3.set_xlabel("Time [s]")
ax3.set_ylabel(r"$w_{t_0}(t) x(t)$")
ax3.grid(True)
ax4.plot(freq, abs_Fx)
# ax4.set_ylim(fmin, fmax)
ax4.ticklabel_format(style="scientific", axis="both", scilimits=(1, 2))
ax4.set_ylabel(r"$|F(x w_{t_0})|$")
ax4.set_xlabel("Frequency [Hz]")
ax4.grid(True)
plt.show()
interact(
local_dft_magnitude,
x=fixed(x),
xmin=fixed(xmin),
xmax=fixed(xmax),
t=fixed(t),
freq=fixed(freq),
window=fixed(gauss_win),
center=(M, N - L + 1, 1000),
)
interactive(children=(IntSlider(value=16200, description='center', max=32368, min=200, step=1000), Output()), β¦
<function __main__.local_dft_magnitude(x: numpy.ndarray, xmin: float, xmax: float, t: numpy.ndarray, freq: numpy.ndarray, window, center: int)>
- In this interactive activity we can shift the window of interest of the fourier transform over time. This makes it possible to observe the spectral components of the original signal localized in time. We can clearly see that when the original signal is quiet most of the sampled signal is noise, with a large peak around 0 Hz. When the bird's song is loud the 0 Hz component continues there, but it is overwhelmed by the more energetic harmonic frequencies.
Short-Time Fourier Transform: spectrogram & reconstructionΒΆ
The computation of the Short-Time Fourier Transform (STFT) first consists in dividing the time signal into short, overlapping segments of equal length. A Fourier transform is then computed separately for each of the extracted segments. The STFT gathers local Fourier transforms (vertical frequency axis) calculated on regularly spaced time windows (horizontal time axis) so as to deduce the spectrogram, a 2D visual heat map. It is defined as the squared modulus of the STFT, interpreted as a time-frequency density of energy.
In general, windows overlap in time, which leads to a certain redundancy enhancing the readability of the time-frequency information. This redundant representation can be reversed (pseudo-inverse) to reconstruct the original signal. One can show that the energy is preserved from the time to the frequency domain (energy conservation).
The parameters to compute a spectrogram parameters are:
- the shape of the analysis window;
- the window size;
- the time spacing between 2 successive windows, denoted
spacinghere, which also conditions the amount of overlap between consecutive windows.
The spectrogram function directly provides the squared magnitude of the Short-Time Fourier Transform (STFT). The vertical axis corresponds to the positive frequencies between $0$ and $ \frac{f_s}{2}$, and the horizontal axis to temporal samples.
A code example is given below.
fs, x = scwav.read("sounds/bird.wav")
# length of the extracted eportion of the signal
N = 14000
x = x[0:N]
t = np.arange(0, N) / fs
Audio(x, rate=fs)
Computing the STFT transform of the signal:
# window width (usualy taken as a power of 2)
width = 1024
# number of samples between 2 successive segments
spacing = width / 8
#spacing = width
# window used
window = "hann"
# computing the spectrogram
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs, window="hann", nperseg=width, noverlap=width - spacing
)
Graphical representation of the analysis window
w = sg.get_window("hann", width)
plt.plot(w)
plt.grid()
plt.xlabel("Sample index $n$")
plt.ylabel(r"$w[n]$")
plt.title("Hann window (width={0})".format(width))
plt.show()
Graphical representation of the spectrogram
plt.figure(figsize=(10, 10))
# signal in the time domain
plt.subplot(211)
plt.plot(t, x)
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.grid()
# associated spectrogram
plt.subplot(212)
ax = plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
Exercice 4 (β)ΒΆ
How should the axes of the spectrogram be read and interpreted? Justify the range of frequencies covered. Specify how the colormap should be interpreted.
Specify how the horizontal and vertical axes of the spectrogram are indexed and discretized. Express the number of elements along each axis in function of the window size $L$, the overlap size $S$ and the number of samples $N$ is the signal.
Plot the spectrograms of the
glockenspiel_mono.wavanddesactive_mono.wavsounds. provided in thesoundsfolder. Observe and comment.The
spectrogramfunctions requires an optional keyword parameterwindow. Try two other windows (see associated documentation online) and briefly compare the resulting spectrograms.
Answers to Exercise 4ΒΆ
The spectogram shows the repartition of the energy of the signal according to time and frequency. The yellower it gets, the more energy there is for the (t,f) associated with this coordinate.
The vertical resolution of the signal depends only on the window size $L$, as the STFT obtains the fourier representation as an array the ob same length as its time representation. We can expres this resolution as $r_V = \frac{L}{2} + 1 $, where $r_V$ is the length of the array that contains the frequency component of the spectrogram. It is divided by 2 because we are only concerned with positive frequencies on the spectrogram. The horizontal resolution depends on the number of samples $N$, window size $L$ and the overlap size $S$ as $r_H = \frac{N}{L-S} $. Here $r_H$ represents the number of points for which the STFT is calculated and it is clearly the total number of points divided by the "effective" lenght of a single point. It is important to keep in mind that S here is the overlap size, which is very different from the spacing size used in the code.
# Exercuse 4.3
fs, x = scwav.read("sounds/glockenspiel_mono.wav")
# length of the extracted eportion of the signal
N = 14000
x = x[0:N]
t = np.arange(0, N) / fs
Audio(x, rate=fs)
# window width (usualy taken as a power of 2)
width = 1024
# number of samples between 2 successive segments
spacing = width / 8
#spacing = width
# window used
window = "hann"
# computing the spectrogram
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs, window="hann", nperseg=width, noverlap=width - spacing
)
plt.figure(figsize=(10, 10))
# signal in the time domain
plt.subplot(211)
plt.plot(t, x)
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.grid()
# associated spectrogram
plt.subplot(212)
ax = plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
- For the
glockenspielwe can clearly see that the signal can be divided into 3 very distict sections in time. Each of these sections has its own spectral component which is reasonably constant within. These 3 distinct sections correspond probably to 3 different musical notes.
# Exercise 4.3
fs, x = scwav.read("sounds/desactive_mono.wav")
# length of the extracted eportion of the signal
N = 14000
x = x[0:N]
t = np.arange(0, N) / fs
Audio(x, rate=fs)
# window width (usualy taken as a power of 2)
width = 1024
# number of samples between 2 successive segments
spacing = width / 8
#spacing = width
# window used
window = "hann"
# computing the spectrogram
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs, window="hann", nperseg=width, noverlap=width - spacing
)
plt.figure(figsize=(10, 10))
# signal in the time domain
plt.subplot(211)
plt.plot(t, x)
plt.ylabel("Amplitude")
plt.xlabel("Time [s]")
plt.grid()
# associated spectrogram
plt.subplot(212)
ax = plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
- For the
desactivea similar analysis can be made, but now instead of dividing the signal into musical notes in time, it makes sense dividing it into syllables. "DΓ©sactivΓ©" has 4 syllables, which can clearly be spotted on the grath over time and on the spectrogram. It is also interesting to remark that the spectrum within a syllable is also approximately constant.
# Exercise 4.4
fs, x = scwav.read("sounds/glockenspiel_mono.wav")
# length of the extracted eportion of the signal
N = 14000
x = x[0:N]
t = np.arange(0, N) / fs
Audio(x, rate=fs)
# window width (usualy taken as a power of 2)
width = 1024
# number of samples between 2 successive segments
spacing = width / 8
#spacing = width
# window used
#window = "hann"
array_window = ["hann", "triang", "boxcar"]
for window in array_window:
# computing the spectrogram
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs, window=window, nperseg=width, noverlap=width - spacing
)
plt.figure(figsize=(10, 10))
# associated spectrogram
plt.subplot(212)
ax = plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.title(window)
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
- The spectrogram with the
triangis very similar to thehann, however, theboxcaris clearly much different. The spectrogram withboxcarseems to have many artifacts due to sampling. This can be explained as its fourier transform is the sinc function, which gets suppressed very slowly over the frequencies that are not concerned with it.
Exercise 5: time-frequency analysis of sine waves (β)ΒΆ
Consider the signal x stored in the signal_2sinus.mat file loaded below, sampled at $f_s =4096$ Hz.
Study this signal and its spectrogram with a window of your choice. To this aim, you may observe the spectrogram for different windows and widths, for instance
width=64,128, 256, 512, 1024. Which choice of window and width allows the 2 frequencies present in this signal to be properly distinguished? Briefly justify your answer.What is the content of this signal ? Try to be as precise as possible in your description, including characteristic frequencies. Briefly comment and explain your observations.
# Load signal x
# load the signal x and the times instants associated with the samples
data = scio.loadmat("sounds/signal_2sinus.mat")
t = data["t"][0]
x = data["x"][0]
fs = data["Fe"][0][0]
plt.figure()
plt.plot(t, x)
plt.xlabel(r"$t$ [s]")
plt.ylabel(r"$x(t)$")
plt.show()
Audio(x, rate=fs)
width = 1024
window = sg.get_window("hann", width)
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs=fs, window=window, nperseg=width, noverlap=7 * width / 8
)
# the top of the spectrogram corresponds to the high frequencies
plt.figure(figsize=(12.7, 6))
plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
# exercise 5.1
array_width = [64, 128, 256, 512, 1024]
for width in array_width :
#window = sg.get_window("hann", width)
window = sg.get_window("blackman", width)
f_Sx, t_Sx, Sx = sg.spectrogram(
x, fs=fs, window=window, nperseg=width, noverlap=7 * width / 8
)
# the top of the spectrogram corresponds to the high frequencies
plt.figure(figsize=(11, 4))
plt.pcolormesh(t_Sx, f_Sx, Sx, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.title("width : {0:1d}".format(width) )
plt.grid()
plt.colorbar()
plt.show()
Fx = np.fft.fft(x)
fs = 4096
N = 4096
# list of frequencies associated with the DFT
freq = np.arange(0, N) * (fs / N)
max = 426
min = 380
plt.figure()
plt.plot(freq[min:max], np.abs(Fx[min:max]))
plt.xlabel("Frequency [Hz]")
plt.ylabel(r"$|Fx|$")
plt.grid()
plt.show()
- and 2. For all window type tested, only at the width of 1024 both frequencies of the signal can be clrearly differentiated. The Figure above clearly shows that the signal is primarily composed of 2 sinusoidal waves with frequencies of 400Hz and 425Hz, so, there is 25Hz of difference between them. In order to clearly differentiate them, it is necessary to have at least 1 point in between, so we need a resolution smaller than 12.5Hz. Using the equation $r_V = \frac{L}{2} + 1 $ we have that for L=1024 the number of points where the spectrum is calculated is of 513, considering the maximum frequency of 4096, the resolution is approximately 8Hz, which is smalled than 12.5Hz as desided. If the same calculation is made for L=512, the resolution reached is 16Hz, which is greather than 12.5Hz, so it is not enough to differentiate both frequencies.
[Bonus] Exercice 6: time-frequency analysis of hyperbolic chirps (β)ΒΆ
We consider the superposition of 2 "chirps", i.e., signals whose frequency varies as an hyperbolic function of time (see the help of the chirp function in Python). As a result, we face a resolution problem, since the width of the window is constant for the entire signal, and may thus not be suitable for all the time instants.
The ultrasonic cries produced by bats typically have a similar structure, as well as gravitational waves observed in astrophysics.
fs = 4096
T = 3
f1_min, f1_max = 200, 1600
f2_min, f2_max = 250, 1650
t = np.linspace(0, 3, fs * T)
y1 = sg.chirp(t, f1_min, T, f2_max, method="hyperbolic")
y2 = sg.chirp(t, f2_min, T, f2_max, method="hyperbolic")
y = y1 + y2
Audio(y, rate=fs)
width = 256 #better for t=2s
#width = 2048 #better for t=0s
window = sg.get_window("hann", width)
f_Sy, t_Sy, Sy = sg.spectrogram(
y, fs=fs, window=window, nperseg=width, noverlap=7 * width / 8
)
plt.figure()
plt.pcolormesh(t_Sy, f_Sy, Sy, norm=LogNorm())
plt.ylabel("Frequency [Hz]")
plt.xlabel("Time [s]")
plt.grid()
plt.colorbar()
plt.show()
QuestionsΒΆ
- Observe the results obtained by the above analysis. What is the nature of this signal?
- Observe the quality of the spectrogram using different windows. Modify the width of the window, and highlight situations where the spectrogram allows or doesn't allow the different components of the signal to be distinguished.